36 research outputs found

    Improvements on automatic speech segmentation at the phonetic level

    In this paper, we present some recent improvements in our automatic speech segmentation system, which needs only the speech signal and the phonetic sequence of each sentence of a corpus to be trained. It estimates a GMM using all the sentences of the training subcorpus, where each Gaussian distribution represents an acoustic class. The probability densities of these classes are combined with a set of conditional probabilities to estimate the probability densities of the states of each phonetic unit. The initial values of the conditional probabilities are obtained from a segmentation of each sentence that assigns the same number of frames to each phonetic unit. A DTW algorithm fixes the phonetic boundaries using the known phonetic sequence. This DTW is one step of an iterative process that segments the corpus and re-estimates the conditional probabilities. The results presented here demonstrate that the system has a good capacity to learn how to identify the phonetic boundaries. © 2011 Springer-Verlag.
    This work was supported by the Spanish MICINN under contract TIN2008-06856-C05-02.
    Gómez Adrian, JA.; Calvo Lance, M. (2011). Improvements on automatic speech segmentation at the phonetic level. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany). 7042:557-564. https://doi.org/10.1007/978-3-642-25085-9_66
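    The initialization described in this abstract, a segmentation that gives every phonetic unit the same number of frames, can be sketched as follows. This is only an illustration: the function name `uniform_segmentation` and the handling of frame counts that do not divide evenly are assumptions, not details taken from the paper.

```python
def uniform_segmentation(num_frames, num_units):
    """Split num_frames frames into num_units contiguous segments of
    (nearly) equal length. Returns (start, end) pairs, end exclusive.

    Hypothetical helper: the rounding rule (integer floor) is an assumption.
    """
    if num_units <= 0 or num_frames < num_units:
        raise ValueError("need at least one frame per phonetic unit")
    # Boundary i sits at floor(i * num_frames / num_units).
    bounds = [i * num_frames // num_units for i in range(num_units + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_units)]


# Example: a 10-frame sentence with 4 phonetic units.
segments = uniform_segmentation(10, 4)
```

    In an iterative scheme such as the one the abstract outlines, a segmentation like this would only be a starting point that later alignment passes refine.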

    Uso de metodologías activas en la implantación de IIP en el grado en informática de la UPV

    This paper describes the experience of introducing the first-year subject Introduction to Computer Science and Programming (IIP) in the Computer Science degree at the Escuela Técnica Superior de Ingeniería Informática (ETSINF) of the Universitat Politècnica de València (UPV). It highlights the use of active teaching and learning methodologies that incorporate group work, the design of an assessment method consistent with the methodology employed, and the incorporation of technological tools to support teaching. Additionally, the positive and negative aspects of the experience are described and evidenced from the point of view of both the teacher and the student.
    Peer Reviewed

    ELIRF at MEDIAEVAL 2013: Spoken Web Search Task

    In this paper, we present the systems that the Natural Language Engineering and Pattern Recognition group (ELiRF) has submitted to the MediaEval 2013 Spoken Web Search task. All of them are based on a Subsequence Dynamic Time Warping algorithm and are zero-resource systems.
    Work funded by the Spanish Government and the E.U. under contract TIN2011-28169-C05 and FPU Grant AP2010-4193.
    Gómez Adrian, JA.; Hurtado Oliver, LF.; Calvo Lance, M.; Sanchís Arnal, E. (2013). ELIRF at MEDIAEVAL 2013: Spoken Web Search Task. CEUR Workshop Proceedings. 1042:59-60. http://hdl.handle.net/10251/38157
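    For readers unfamiliar with Subsequence Dynamic Time Warping, the following is a minimal textbook-style sketch, not the ELiRF systems themselves. It assumes Euclidean frame distances and the three standard step directions; the function names are hypothetical. The key difference from plain DTW is that the query may start and end anywhere in the longer document.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, doc):
    """Find the region of `doc` that best matches `query`.

    Returns (cost, start, end): accumulated cost and the inclusive
    frame range in `doc`. Generic sketch, assumed step pattern.
    """
    n, m = len(query), len(doc)
    acc = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    # Free start: the first query frame may align to any document frame.
    for j in range(m):
        acc[0][j] = euclidean(query[0], doc[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            candidates = [(acc[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((acc[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((acc[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            acc[i][j] = euclidean(query[i], doc[j]) + best_cost
            start[i][j] = best_start
    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], start[n - 1][end], end


# Toy example with one-dimensional "features".
query = [[1.0], [2.0], [3.0]]
doc = [[10.0], [10.0], [10.0], [1.0], [2.0], [3.0], [10.0], [10.0]]
cost, first, last = subsequence_dtw(query, doc)
```

    In practice the accumulated cost is usually normalized by path length before detections are ranked; that step is omitted here.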

    Fretting : review on the numerical simulation and modelling of wear, fatigue and fracture

    This chapter presents a general background and the state of the art of numerical simulation and modeling of the fretting phenomenon in terms of wear, fatigue and fracture. First, fretting and its implications are introduced. Second, different methodologies for wear modeling and simulation are described and discussed. Afterwards, fatigue and fracture analysis approaches are reviewed. To that end, multiaxial fatigue parameters are introduced, with an emphasis on the physical basis of the fretting phenomenon and the suitability of each model. In addition, the propagation phase based on linear elastic fracture mechanics (LEFM) is presented, and its analysis via the finite element method (FEM) and the eXtended finite element method (X-FEM) is compared. Finally, different approaches and the latest developments in fretting fatigue lifetime prediction are presented and discussed.

    A phonetic-based approach to query-by-example spoken term detection

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-41822-8_63
    Query-by-Example Spoken Term Detection (QbE-STD) tasks are usually addressed by representing speech signals as a sequence of feature vectors by means of a parametrization step, and then using a pattern matching technique to find the candidate detections. In this paper, we propose a phoneme-based approach in which the acoustic frames are first converted into vectors representing the a posteriori probabilities for every phoneme. This strategy is especially useful when the language of the task is known a priori. Then, we show how this representation can be used for QbE-STD using both a Segmental Dynamic Time Warping algorithm and a graph-based method. The proposed approach has been evaluated on a QbE-STD task in Spanish, and the results show that it can be an adequate strategy for tackling this kind of problem.
    Work partially supported by the Spanish Ministerio de Economía y Competitividad under contract TIN2011-28169-C05-01 and FPU Grant AP2010-4193, and by the Vic. d’Investigació of the UPV (PAID-06-10).
    Hurtado Oliver, LF.; Calvo Lance, M.; Gómez Adrian, JA.; García Granada, F.; Sanchís Arnal, E. (2013). A phonetic-based approach to query-by-example spoken term detection. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany). 8529:504-511. https://doi.org/10.1007/978-3-642-41822-8_63

    Modelos de la teoría de grafos aplicados a problemas de competiciones de programación

    The subject Algorithms for Problem Solving of the Bachelor’s Degree in Computer Science at the ETSINF is geared towards solving programming challenges that are usually taken from programming competitions, such as the Southwestern Europe Regional Contest (SWERC), in which students from the ETSINF have been participating regularly in recent years. Solving such a problem requires building a suitable model for it, finding an optimal solution by means of this model, and being able to program it without bugs in a short period of time. Skill in solving these problems is taken very much into account in the recruiting processes of big technology companies such as Google, Apple, Yahoo, Microsoft or Facebook. We show a collaboration between two elective subjects of this degree: Algorithms for Problem Solving (CP) and Graphs, Models and Applications (GMA). This collaboration was proposed by students who were taking both subjects simultaneously. The goals are to redirect part of the contents of GMA towards the analysis of models that frequently appear in programming competition problems, and to help students face these challenges. The methodology consists of posing several problems from the point of view of both subjects. The first assessments of the innovation are positive.
    Project funded by the Universitat Politècnica de València. PIME-B08
    Jordan Lluch, C.; Gómez Adrian, JA.; Calvo Lance, M.; Conejero Casares, JA. (2016). Modelos de la teoría de grafos aplicados a problemas de competiciones de programación. In In-Red 2016. II Congreso nacional de innovación educativa y docencia en red. Editorial Universitat Politècnica de València. https://doi.org/10.4995/INRED2016.2016.4327

    An algorithm for automatic speech understanding over word graphs

    In this work we propose an algorithm for automatic speech understanding that takes a word graph as its input. First, this word graph is processed by means of a dynamic programming algorithm, which produces a second graph enriched with semantic information. Computing the best path over this second graph yields the most likely concept sequence, given the acoustic evidence reflected in the input word graph. As a result of the semantic decoding, the word sequence attached to the concept sequence and its semantic segmentation are also obtained.
    Calvo Lance, M.; Gómez Adrian, JA.; Sanchís Arnal, E.; Hurtado Oliver, LF. (2012). Un algoritmo para la comprensión automática del habla sobre grafos de palabras. PROCESAMIENTO DEL LENGUAJE NATURAL. (48):105-112. http://hdl.handle.net/10251/28874
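    The best-path step mentioned in this abstract can be illustrated with a generic dynamic programming search over a directed acyclic graph. The sketch below is not the authors' algorithm: the edge format `(src, dst, label, log_prob)`, the assumption that node indices are already in topological order, and the toy concept labels are all hypothetical.

```python
import math


def best_path(num_nodes, edges):
    """Highest-scoring path from node 0 to node num_nodes - 1 in a DAG.

    edges: (src, dst, label, log_prob) tuples with src < dst, so node
    indices are assumed to be in topological order.
    Returns (total_log_prob, labels along the best path).
    """
    score = [-math.inf] * num_nodes
    back = [None] * num_nodes
    score[0] = 0.0
    # Visiting edges by source node means every path into a node is
    # scored before any edge leaving that node is relaxed.
    for src, dst, label, log_prob in sorted(edges, key=lambda e: e[0]):
        if score[src] + log_prob > score[dst]:
            score[dst] = score[src] + log_prob
            back[dst] = (src, label)
    labels = []
    node = num_nodes - 1
    while back[node] is not None:
        node, label = back[node][0], back[node][1]
        labels.append(label)
    return score[num_nodes - 1], labels[::-1]


# Toy graph whose edges carry made-up concept labels and probabilities.
toy_edges = [
    (0, 1, "greeting", math.log(0.6)),
    (0, 1, "query", math.log(0.4)),
    (1, 2, "time", math.log(0.5)),
    (0, 2, "price", math.log(0.2)),
]
total, concepts = best_path(3, toy_edges)
```

    Working in log probabilities turns products into sums and avoids numerical underflow on long paths.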

    Multimodal dialog system based on statistical models

    In this paper, we present a multimodal dialog system. In addition to input and output multimodality, the main feature of the system is that its key modules are based on statistical models.
    Work partially funded by the Spanish government under project TIN2008-06856-C05-02 and by the Universitat Politècnica de València under project 20100982.

    ConvDTW-ACS: Audio Segmentation for Track Type Detection During Car Manufacturing

    This paper proposes a method for Acoustic Constrained Segmentation (ACS) in audio recordings of vehicles driven through a production test track, delimiting the boundaries of surface types in the track. ACS is a variant of classical acoustic segmentation where the sequence of labels is known, contiguous and invariable, which is especially useful in this work as the test track has a standard configuration of surface types. The proposed ConvDTW-ACS method utilizes a Convolutional Neural Network for classifying overlapping image chunks extracted from the full audio spectrogram. Then, our custom Dynamic Time Warping algorithm aligns the sequence of predicted probabilities to the sequence of surface types in the track, from which timestamps of the surface type boundaries can be extracted. The method was evaluated on a real-world dataset collected from the Ford Manufacturing Plant in Valencia (Spain), achieving a mean error of 166 milliseconds when delimiting, within the audio, the boundaries of the surfaces in the track. The results demonstrate the effectiveness of the proposed method in accurately segmenting different surface types, which could enable the development of more specialized AI systems to improve the quality inspection process.
    Comment: 12 pages, 2 figures
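    The alignment idea in this abstract, matching per-chunk class probabilities to a known, ordered label sequence, can be sketched with a simple monotonic dynamic program. This is an illustrative stand-in, not the paper's custom DTW: the function name, the log-probability scoring, and the rule that every label covers at least one chunk are assumptions.

```python
import math


def constrained_align(probs, labels):
    """Align frame-wise class probabilities to a known label order.

    probs[t][k]: predicted probability of class k at frame (chunk) t.
    labels: class indices in the order they must occur.
    Returns the first frame index of each label's segment.
    Assumes len(probs) >= len(labels).
    """
    T, S = len(probs), len(labels)
    eps = 1e-12  # guards against log(0)
    score = [[-math.inf] * S for _ in range(T)]
    advanced = [[False] * S for _ in range(T)]
    score[0][0] = math.log(probs[0][labels[0]] + eps)
    for t in range(1, T):
        for s in range(min(S, t + 1)):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else -math.inf
            advanced[t][s] = move > stay
            score[t][s] = max(stay, move) + math.log(probs[t][labels[s]] + eps)
    # Backtrack from the last frame, which must carry the last label.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][s]:
            starts[s] = t
            s -= 1
    return starts


# Toy example: two classes, known order 0 -> 1 -> 0, six frames.
toy_probs = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9],
             [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
boundaries = constrained_align(toy_probs, [0, 1, 0])
```

    Multiplying each returned frame index by the hop duration between chunks would turn the boundaries into timestamps; the hop value depends on how the spectrogram chunks are extracted.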